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We present an efficient implementation of ttie Perdew-Burke-Ernzerliof tiybrid functional PBEO 
witliin tlie full-potential linearized augmented-plane-wave (FLAPW) method. The Hartree-Fock 
exchange term, which is a central ingredient of hybrid functionals, gives rise to a computationally 
expensive nonlocal potential in the one-particle Schrodinger equation. The matrix elements of 
this exchange potential are calculated with the help of an auxiliary basis that is constructed from 
products of FLAPW basis functions. By representing the Coulomb interaction in this basis the 
nonlocal exchange term becomes a Brillouin-zone sum over vector-matrix-vector products. The 
Coulomb matrix is calculated only once at the beginning of a self-consistent-field cycle. We show that 
it can be made sparse by a suitable unitary transformation of the auxiliary basis, which accelerates 
the computation of the vector-matrix-vector products considerably. Additionally, we exploit spatial 
and time-reversal symmetry to identify the nonvanishing exchange matrix elements in advance and 
to restrict the k summations for the nonlocal potential to an irreducible set of k points. Favorable 
convergence of the self-consistent-field cycle is achieved by a nested density-only and density-matrix 
iteration scheme. We discuss the convergence with respect to the parameters of our numerical scheme 
and show results for a variety of semiconductors and insulators, including the oxides ZnO, EuO, 
AI2O3, and SrTiOs, where the PBEO hybrid functional improves the band gaps and the description 
of localized states in comparison with the PBE functional. Furthermore, we find that in contrast to 
conventional local exchange-correlation functionals ferromagnetic EuO is correctly predicted to be 
a semiconductor. 



PACS numbers: 71.15.Ap, 71.15.Mb 

I. INTRODUCTION 

Within the last decades density-functional theory 
(DFT) (Refs.Q and[2]) has evolved into the state of the art 
of electronic-structure calculations. It is usually applied 
within the Kohn-Sham (KS) formalism)^ which maps the 
interacting many-electron system onto a noninteracting 
system with the same density. All exchange and corre- 
lations effects of the many-electron system are incorpo- 
rated into the so-called exchange-correlation (xc) energy 
functional, which is not known exactly and must be ap- 
proximated in practice. The choice of the xc functional is 
the only practical approximation in this otherwise exact 
theory and determines the precision and efficiency of the 
numerical DFT calculations. 

Fortunately, already the local-density approximation 
(LDA)^i^ where the xc energy functional is approx- 
imated locally by that of the homogeneous electron 
gas, gives reliable results for a wide range of materials 
and properties. The generalized gradient approximation 
(GGA) (Refs.H and [TI) goes beyond this approximation 
by incorporating also the density gradient of the inho- 
mogeneous system. Due to its improved accuracy the 
GGA has led to many applications of DFT in quantum 
chemistry. However, there are still many cases where 
LDA and GGA give poor results or are even qualitatively 
wrong. Among these cases are the band gaps of solids, 
the atomization energies, bond lengths, and adsorption 
sites of molecules as well as systems with localized states 
such as transition-metal oxides. During the last decade 



hybrid functionals, which combine a local or semilocal 
xc functional with nonlocal Hartree-Fock (HF) exchange, 
have been shown to overcome these deficiencies to a great 
extentj^^— Hybrid functionals are usually applied within 
the generalized Kohn-Sham (gKS) scheme^^ where the 
HF exchange term leads to a nonlocal exchange potential 
in the one-particle equations. The first hybrid functional, 
a half-and-half mixing of the LDA functional with HF ex- 
change, was proposed by Becke in 1993 Since then var- 
ious ah initio and semiempirical hybrid functionals have 
been published i^ii^"— The PBEO functional)^ on which 
we focus in this paper, does not contain any empirical 
parameters and is thus an ab initio hybrid functional. 

Hybrid functionals for systems with periodic bound- 
ary conditions were first implemented in the late 1990s 
within a basis of Gaussian-type functions and the pseu- 
dopotential plane-wave approachji2ii£ In 2005 Paier et 
ali^ developed an implementation within the projector- 
augmented-wave (PAW) technique. In 2006 Novak et 
al^ proposed an approximate scheme within the full- 
potential linearized augmented-plane-wave (FLAPW) 
approach. There the nonlocal exchange term is evalu- 
ated only in individual atomic spheres and only for se- 
lected I channels. In this paper, we present an efficient 
numerical implementation of hybrid functionals within 
the FLAPW method, which does not suffer from these 
constraints. The FLAPW method provides a highly ac- 
curate basis for all-electron calculations, with which a 
large variety of materials, including open systems with 
low symmetry, d- and /-electron systems as well as ox- 
ides, can be studied. It treats core and valence electrons 
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on an equal footing. 

In the first Hartree-Fock implementation within the 
FLAPW method, Massidda et ai— employed an algo- 
rithm that is routinely used to generate the potential 
created by the electronic and nuclear charges and thus 
solves the Poisson equation.— This Poisson solver can 
also be used for the nonlocal exchange potential because 
its matrix representation involves formally identical six- 
dimensional integrals over space. Instead of the real 
charge one then uses an artificial charge formed by the 
product of two wave functions. Unfortunately, although 
the algorithm is very fast, the Poisson solver must be 
called many times instead of just once when applied to 
the exchange potential, which makes this approach com- 
putationally very expensive. 

In this paper we propose an alternative approach that 
employs an auxiliary basis, the so-called mixed prod- 
uct basis, which is constructed from products of LAPW 
basis functions and consists of muffin-tin (MT) func- 
tions and interstitial plane waves This basis allows to 
decompose the state-dependent six-dimensional integral 
into two three-dimensional and one state-independent 
six-dimensional integral, which is the Coulomb matrix 
represented in the mixed product basis. The Coulomb 
matrix is calculated once at the beginning of the self- 
consistent-field cycle while only the three-dimensional in- 
tegrals must be evaluated in each iteration. In this for- 
mulation the matrix elements of the nonlocal exchange 
potential are evaluated as Brillouin-zone (BZ) sums over 
vector-matrix-vector products. Furthermore, by a suit- 
able unitary transformation, nearly all MT functions 
become multipole-free, which makes the Coulomb ma- 
trix sparse and reduces the computational effort for the 
vector-matrix-vector products considerably. As the ex- 
change interaction is small compared with the other en- 
ergy terms, we introduce a band cutoff as a convergence 
parameter and construct the exchange matrix only in the 
reduced Hilbert space formed by the wave functions up 
to this cutoff. In this way the number of matrix elements 
that must be calculated explicitly is reduced. Because 
of spatial and time-reversal symmetry some of the ex- 
change matrix elements vanish. In order to decide in ad- 
vance, which of the matrix elements will be nonzero, we 
employ a simple auxiliary operator, which has the same 
symmetry properties as the nonlocal exchange potential. 
Additionally, we use group theory to restrict the k sum- 
mations for the nonlocal exchange term to the smaller 
set of k points, which are inequivalent with respect to 



the group of symmetry operations. 

The long-range nature of the Coulomb interaction gives 
rise to a divergence of the Coulomb matrix in the center 
of the BZ leading to a divergent integrand in the ex- 
change matrix elements. In a previous publication we 
showed that the Coulomb matrix in the mixed product 
basis can be decomposed exactly into a divergent and a 
nondivergent parti^ The resulting divergent integrand 
is then given analytically and can be integrated exactly 
while the nondivergent part is treated with standard nu- 
merical integration techniques. We also calculate correc- 
tions beyond the divergent 1 /q'^ term, which are obtained 
from k • p perturbation theory. 

The paper is organized as follows. Section [III gives a 
brief introduction to hybrid functionals. Our implemen- 
tation of the nonlocal exchange potential is discussed in 
detail in Sec. |llll In Sec. |IV] we then apply the PBEO 
hybrid functional to prototype semiconductors and in- 
sulators and discuss the convergence of our numerical 
scheme. Here we focus in particular on oxide materials. 
Section |V] gives a summary. 

II. THEORY 

A hybrid functional E^^^ is a mixture of a standard 
local or semilocal xc functional iJ^c = + with the 
exact nonlocal HF exchange energy E^^ evaluated with 
KS wave functions 

= (f - a) . + aE^"^ + El , (I) 

where a is a mixing parameter with < a < 1. The ad- 
mixture of E^^ is motivated by the adiabatic connection 
theorem^^"— that provides an exact expression for the 
xc functional. This expression becomes identical to the 
HF exchange term in the weakly interacting limit, which 
shows that the nonlocal functional E^^ is a substantial 
ingredient of the xc functional. 

Sometimes a screened exchange term is used instead of 
jj^NL ^ 29^30 jj^ some hybrid functionals one further decom- 
poses E}^ and Ell into the local-density and local-gradient 
parts and mixes them differentlyi^i^ In this paper we fo- 
cus on PBEO (Ref. [H) with the local functionals^ 

El^El^^. El^El^^ (2) 

and the bare HF exchange term 



where the sum runs over the occupied (occ.) KS orbitals ^p'^■^^ of spin a, band index n, and Bloch vector k. Here 
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and in the following by a summation over Bloch vectors 
k or q we mean an integration over the Brillouin zone, 
which is sampled by a finite set of mesh points. The mix- 
ing parameter a = 0.25 was derived from first principles 
in Ref. [H 

Hybrid functionals are typically treated within the gKS 
(Ref. fisl ) leading to a noninteracting system of electrons 
that experience a local as well as a nonlocal potential. 
The one-particle Schrodinger equation in this scheme 
takes the form 

(4) 

with the energy eigenvalues e^^. and the local one-particle 
Hamiltonian 
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(5) 



The effective potential Vcs{r) consists of the external, 
Hartree, and xc potential defined by 



S 



[{l~a)El^ + E^] , (6) 



where the functional derivative is with respect to the 
electron spin density. The nonlocal exchange potential 
derives from Eq. ([3]) and is given by 



occ. BZ a / \ iT* / /\ 
^r-^ ¥'nq(l')</5,^q(r ) 



(7) 



n q 



III. IMPLEMENTATION 

As in a standard DFT approach, the one-particle 
Eqs. (Ill) must be solved self-consistently because the ef- 
fective potential is a functional of the density. Apart 
from this, the HF exchange term of the PBEO hybrid 
functional gives rise to the nonlocal potential in Eq. ([4]), 
which depends on the density matrix and, thus, on the 
occupied states explicitly. This is an important issue in 
reaching the self-consistent solution for both the density 
and the density matrix. Already in DFT calculations 
with local or semilocal functionals the iterative procedure 
does normally not converge, if one uses the complete out- 
put density as input for the next iteration. In general, one 
must apply a mixing scheme for the density (or poten- 
tial), e.g., simple or Broyden mixingj^i^ which produces 
an average density out of the densities of previous itera- 
tions as the input density for the next step. With the ad- 
ditional complication of the nonlocal exchange potential, 
also the density matrix must be mixed in a suitable way, 
in principle. In fact, if we simply use the output density 
matrix for the next iteration, the whole procedure takes 
prohibitively many steps to reach self-consistency. We 
will show in Sec. IIVI that with a simple trick the number 



of iterations can be reduced to that of a normal DFT cal- 
culation without the need for an explicit mixing of the 
density matrix. 

The evaluation of the matrix elements of Eq. ([7]) is by 
far the most time-consuming step in DFT calculations 
with hybrid xc functionals. In fact, it takes much longer 
than any other step in the numerical self-consistent- 
field cycle, even longer than the diagonalization of the 
Hamiltonian matrix, which is the most time-consuming 
part in calculations with local and semilocal functionals. 
The reason for this is the nonlocality of the operator in 
Eq. ([7]), which gives rise to six-dimensional integrals 



T rNL,cr /T.\ 

occ. BZ 

=-EE 

n" q 



(8) 

—, rr d rd r , 



whereas for the local operators in standard DFT calcula- 
tions only three-dimensional integrals must be evaluated. 

In Eq. ([5]) we have represented the exchange operator 
in terms of the wave functions rather than the LAPW ba- 
sis functions. This is advantageous for two reasons: (1) 
if the states n and n' fall into different irreducible sym- 
metry representations, the corresponding matrix element 
is zero and need not be calculated at all (see Sec. IIIID]) . 
(2) Although very important, the exchange energy is a 
relatively small energy contribution compared with ki- 
netic and potential energies. Therefore, we can afford to 
describe the nonlocal exchange potential in a subspace 
of wave functions up to a band cutoff rimax- We only 
construct the matrix for the elements with n,n' < nmaxi 
where rimax is a convergence parameter and the rest is set 
to zero. We will show in Sec. lIVI that the results converge 
reasonably fast with respect to riuiax- This reduces the 
computational demand considerably. 

The sum over the occupied states in Eq. ((S]) involves 
core and valence states. Core states are dispersionless, 
which can be shown to lead to particularly simple and 
computationally cheap expressions for their contribution 
to the exchange term»^ The valence states, on the other 
hand, show a distinct k dependence that must be taken 
properly into account. Here we emp loy the mixed prod- 
uct basis (MPB) (Refs. [13 and that is constructed 
from products of LAPW basis functions. When applied 
to Eq. ([8|) the six-dimensional integral decomposes into a 
vector-matrix- vector product, where the matrix and the 
two vectors are the MPB representations of the Coulomb 
interaction and the two wave-function products, respec- 
tively. The Coulomb matrix is state-independent and 
must only be calculated once at the beginning of the self- 
consistent-field cycle. 

In the following we describe the implementation of 
the nonlocal exchange term in detail. Sections IIII Al 
and IIII Bl introduce the LAPW basis for the wave func- 
tions and the auxiliary MPB for their products, respec- 
tively. In Sec. IIII CI we will show that the Coulomb ma- 
trix can be made sparse, which considerably accelerates 
the vector-matrix-vector multiplications. Furthermore, 
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spatial and time-reversal symmetries are exploited to re- 
duce the computational demand, too, as we will show in 
Sec. IIIIDI The Coulomb matrix diverges in the center of 
the BZ. This divergence gives an important contribution 
to the exchange matrix elements and must be treated 
with care to guarantee a favorable convergence with re- 
spect to the k-point sampling. Sec. IIII El deals with this 
issue. 



A. FLAPW method 

In the all-electron FLAPW method^"— space is par- 
titioned into nonoverlapping atom-centered MT spheres 
and the interstitial region. The core electrons, which 
are predominantly confined to the MT spheres, are de- 
scribed by the fully relativistic Dirac equation. For the 



valence electrons a basis is constructed from plane waves 
in the interstitial region and numerical MT functions 
ufp {r)Yim{r) inside the MT sphere of atom a, where 
Yim{i) denotes the spherical harmonics, f = r/r is a 
unit vector and r is measured from the MT center lo- 
cated at Rq. The function Ui^{r) is the solution of the 
radial scalar-relativistic Dirac equation with the spheri- 
cal average of the spin-dependent effective potential and 
a suitably chosen energy parameter, and ufi{r) is its 
energy derivative. In order to obtain continuous basis 
functions over the whole space a linear combination of 
the MT functions is matched at the sphere boundaries 
to each interstitial plane wave in such a way that the re- 
sulting augmented plane waves are continuous in value 
and first radial derivative. In a given unit cell the spin- 
dependent basis functions with Bloch vector k are then 
given by 



Imax I 1 

T^E E Y.^Zp^^,G)uf-{\r-Ra\)Yirrr{r^a) if r £ MT(a) 



,j(k+G)T 



(9) 



if r ^ MT 



with the unit-cell volume Jl, the number of unit cells 
A^, and reciprocal lattice vectors G. The basis functions 
are normalized over the whole space. For practical cal- 
culations cutoff values for the reciprocal lattice vectors 
I k -|- G I < Gmax ^'iid the angular momentum I < /max a-re 
introduced. For the description of semicore states the ba- 
sis can be augmented by additional functions, so-called 
local orbitalsj^i^^ These are confined to the MT spheres 
and go to zero at the MT sphere boundary. 

In the LAPW basis [Eq. ©] the differential Eq. g]) 
becomes a generalized eigenvalue problem 



E 

G' 



H&G'(k)+aK''^-<^,(k)|c^,(n,k) 



5&G'(k)c£'(n,k), (10) 



where ^qq/ and V^'^qq, are the matrix representations 
of the operators in Eqs. ([SJ and (O, respectively, and 
S'qq, denotes the overlap matrix. The matrix qq, is 
obtained from Eq. ([8|) by multiplying from right and left 
with the inverse matrix of eigenvectors and its adjoint, 
respectively. 



<G'G'(k) = E 



^5^*G"(k)c£"(n,k) 



T rNL,cr /i \ 



5^c^*„(n',k)5a»G'(k) 



G' 



(11) 



B. Mixed product basis 

Our implementation of the exchange potential relies 
on the state-independent MPB, which is designed to rep- 
resent wave-function products. The MPB was already 
explained in detail in a previous publication.— We only 
sketch the main features here. 

The MPB is constructed from products of LAPW ba- 
sis functions, which gives rise to interstitial plane waves 
(IPWs) 
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,«(k+G)-ro 



e(r) 



in the interstitial region with the step function 



e(r) 



, if r e MT 

1 , if r ^ MT 



(12) 



(13) 



and the set of functions u°^(r)wpp, (r)yLj\/(f) with |/ — 
l'\ < L < l + l' and — L < M < L in the spheres. Usually, 
the latter is highly linearly dependent. In order to remove 
these linear dependences we employ a scheme proposed 
by Aryasetiawan and Gunnarsson in Ref. [39|: for each 
atom and LM channel the overlap matrix of this set is 
diagonalized; (nearly) linearly dependent combinations 
can then be identified easily by small eigenvalues. We 
thus obtain a smaller but still sufficiently flexible basis set 
by only retaining those eigenfunctions whose eigenvalues 
exceed a given threshold value (typically 0.0001). Fur- 
thermore, we must add a spherically symmetric constant 
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function in order to isolate the divergent long- wavelength 
limit of the Coulomb interaction (see Sec. IIIIE|) . In the 
case of magnetic calculations the MPB is made spin- 
independent at this stage by taking into account prod- 
ucts of spin-up as well as spin-down radial functions in 
the construction of the overlap matrices. From the re- 
sulting MT functions MaLMp{r) = MaLp{r)YLM{r) we 
then formally construct Bloch functions 



ik-(T+R„ 



K\mp (r) = ^ E ^'^-i^^P - T - Ra ) 

(14) 

where the sum runs over the lattice vectors and 
MaLMp{^) 0, if r is larger than the MT radius Sa- 

As in the case of the LAPW basis introduced in the 
last section, we introduce cutoff values G[^^^ and Lmax 
for the IPWs 



|k + G| < G'^ 
and the MT functions 



< 2G„ 



< 2L 



(15) 



(16) 



We will show in Sec. lIVI that the cutoff values can be cho- 
sen much smaller than twice the corresponding LAPW 
cutoff values although this is the exact limit. 

In contrast to the LAPW basis, the IPWs and MT 
functions are not matched at the MT sphere bound- 
aries but instead simply combined into the full MPB 
{A/j^(r)} = {A/k^^,p(r),Mk(r)}. By construction the 
MT functions are orthonormal. As the IPWs and the 
MT functions are defined in different regions of space, 
they do not overlap. Only the IPWs overlap in a non- 
trivial way. Their overlap matrix Oqq/ is given by the 
Fourier transform of the step function in Eq. (|13p 



O. 



GG' 



= e 



G-G' 



With the 



biorthogonal set 

..LA/p(r),EG'(0'')G^G^G'(r)} 

write the completeness relation as 



we 



Y,\Mf){Mf\=J2\Mf){Mf 



(17) 

can further 



(18) 



which is valid in the subspace spanned by the MPB. It is 
important to note that the MPB is constructed in such a 
way that it describes the wave-function products exactly 
in the basis-set limit. 



C. Sparsity of the Coulomb matrix 

With the help of the completeness relations in Eq. ([T5)l , 
the integral in Eq. ([5]) decomposes into a vector-matrix- 



vector product 



with n, n' < n„ 



BZ 



IJ 



xz;,,;(q)(Af>^„k-qK'k) 



(19) 



where the vectors are the MPB rep- 
resentations of the wave-function products and must be 
calculated in each iteration. The Coulomb matrix 



vuiq) 



(r)A/y(r') 



(20) 



on the other hand, is independent of the wave functions 
and must be constructed only once at the beginning of 
the self-consistent-field cycle. It consists of four distinct 
blocks, the diagonal parts MT-MT and IPW-IPW as well 
as the two off-diagonal parts MT-IPW and IPW-MT, 
which are the complex conjugates of each other. The 
evaluation of the different blocks was discussed in detail 
in a previous publication.— We choose an equidistant k- 
point mesh in order to ensure that k— q is again a member 
of the set. Additionally, it contains the F point, which is 
required for a proper treatment of the divergence of the 
Coulomb matrix around q = (see Sec. IIII Ep . 

The vector-matrix-vector products must be evaluated 
in every iteration for each combination of band indices 
n, n' , and n" as well as Bloch vectors k and q. This 
easily amounts to billion matrix operations or more and 
constitutes the computationally most expensive step in 
the algorithm. The operations would become consider- 
ably faster, if the Coulomb matrix could be made sparse. 
This is, in fact, possible with a simple unitary transfor- 
mation of the MT functions within the subspaces of each 
atom and LM channel. Let us consider two MT func- 
tions with the radial parts MaLiir) and MaL2{r). Their 
electrostatic multipole moments are given by 

f^aLP^ / AW(r)r^+'dr ; P = l,2. (21) 
Jo 

Now we apply the unitary transformation 
1 



X [^J'aLlMaLl{r) + ^aL2Ai"aL2 (r)] 
1 



(22a) 



X [fJ.aL2MaLlir) - MaLl A'/aL2 (r)] , 



(22b) 



which is such that the multipole moment of the second 
function vanishes. With this procedure we can generally 
transform a set of MT functions so that the resulting mul- 
tipole moments vanish for all but one function. For exam- 
ple, out of ten functions we would obtain nine multipole- 
free functions and one with a nonvanishing multipole mo- 
ment. We denote the sets of these transformed functions 
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MT 
(^=0) 



MT 
(^+0) 



IPWs 




MT 
(^=0) 



MT 
(H+0) 



IPWs 



Figure 1: Illustration of the Coulomb matrix after transform- 
ing the MPB as described in the text. The elements that are 
in general nonzero are marked. The matrix is predominantly 
block-diagonal: for each atom and LM channel there is one 
block of multipole-free MT functions (/x = 0) and a larger 
block for the combined set of IPWs and MT functions with a 
nonvanishing multipole moment 7^ 0). Additionally, there 
are very few off-diagonal elements between MT functions with 
= and ^ 0. 



by MT(^=0) and MT(^7^0), respectively. By construc- 
tion, the former does not generate a potential outside 
the MT spheres. This means that Coulomb matrix ele- 
ments in Eq. (j20p involving such a function can only be 
nonzero, if the other function is a MT function residing 
in the same MT sphere; all other matrix elements vanish. 
This leads to a very sparse, nearly block-diagonal form of 
the Coulomb matrix illustrated in Fig. [U where we have 
ordered the MPB according to: MT(/i=0), MT(^7^0), 
IPW. There are onsite blocks (one for each LM channel) 
for the MT(^=0) part and one big block for the combined 
set of the MT(^^O) and the IPWs. There are only few 
off-diagonal elements between MT(/i=0) and MT(/x^O) 
functions at the same atom. Exploiting this sparsity in 
the matrix-vector products drastically reduces the num- 
ber of floating point operations and thus the computa- 
tional cost. 



D. Symmetry 

Spatial and time-reversal symmetries are exploited to 
accelerate the code in three ways: (1) inversion symmetry 
leads to real-valued quantities. (2) If the wave functions 
(p'^y^ and ip'^iy^ in Eq. fall into different irreducible 
representations, the corresponding exchange matrix ele- 
ment vanishes. This can be used as a criterion whether 



an element must be calculated explicitly or not. And (3), 
for each k chosen from the irreducible wedge of the BZ 
the q summation in Eq. is restricted to a smaller set 
of Bloch vectors giving rise to an extended irreducible 
BZ. 

In general, the Coulomb matrix in Eq. (|20p is Hermi- 
tian. If the system exhibits inversion symmetry and the 
MPB functions fulfill the condition /(— r) = /*(r), it be- 
comes real-symmetric. Similarly, the vectors in Eq. (fTO|) 
are then real instead of complex. This reduces the com- 
putational demand in terms of both CPU time and mem- 
ory considerably. However, presently the condition only 
holds for the IPWs but not for the MT functions. We 
thus combine the MT functions of each pair of atoms a 
and —a, which are related via inversion symmetry 



M, 



/k 

.LMP 
1 

71 



(r) 
M. 



(23a) 



aLMP 



M, 



Ik 

{-a)L(-M)P 



(r) 



I 

71 



LMP 



(r) + (-l) 



(r)-(-l) 



L + M 



L+M 



M, 



(-a)L{-M)P 



(r) 



(23b) 



{-a)H~M)P 



(r) 



If the atom is placed in the origin, i.e., the atom indices a 
and —a correspond to the same atom, the transformation 
in Eq. only holds for the integer index M < 0, and 
we define 



M, 



Ik 

:LOP 



(r) = 



• M^LOpi^) ^ 



if L even 
if L odd 



(24) 



for M — 0. It is then easy to show that the transformed 
functions will fulfill the condition above. We note that 
this symmetrization leaves the form of the Coulomb ma- 
trix, as shown in Fig. [Tl intact. 

The great orthogonality theorem of group theory de- 
mands that the matrix elements (Vnkl^l'^'n'k) '-'^ ^^^y °P" 
erator A, which commutes with the symmetry operations 
of the system, are zero, if the wave functions fall into 
different irreducible representations. In particular, this 
holds for the exchange operator in Eq. ([7]). Unfortu- 
nately, the irreducible representations are not available 
in our DFT code and their evaluation in each iteration 
would be computationally expensive. Instead, we exploit 
the fact that the great orthogonality theorem applies to 
any operator that has the full symmetry of the system. 
A suitable operator is given by 



e^T(j.) = i-e(r). 



(25) 



The calculation of its matrix elements {fnkl'^^'^lfn'k/ 
is elementary and takes negligible CPU time. If the ma- 
trix elements between two groups of degenerate wave 
functions are numerically zero, we conclude that these 
two groups belong to different irreducible representa- 
tions. Then the corresponding matrix elements of the 
nonlocal exchange potential in Eq. ([TO)) must be zero, 
too. The question remains whether, conversely, the ma- 
trix elements of Eq. (|25p are always nonzero, if the two 
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irreducible representations are identical. This is not ful- 
filled in only two cases. First, either of the two wave 
functions is completely confined to the MT sphere. This 
can be ruled out since we deal with valence or conduction 
states. Second, the matrix elements are zero by accident: 
the overlaps in the interstitial and the MT spheres ex- 
actly cancel. However, this is extremely unlikely, verging 
on the impossible. We find that the procedure provides 
a fast and reliable criterion to decide in advance, which 
exchange matrix elements are nonzero and must be cal- 
culated explicitly. In this way we again save computation 
time. 

In general, if a symmetry operation, which leaves the 
Hamiltonian invariant, acts on a wave function, it gen- 
erates another wave function with the same energy. In 
other words, the solutions of the one-particle equations 
at two different k points are equivalent, if the k vectors 
are related by a symmetry operation. This can be used 



to restrict the set of k points, at which the Hamiltonian 
must be diagonalized, to a smaller set, whose members 
are not pairwise related. This defines the so-called irre- 
ducible Brillouin zone (IBZ), which is routinely employed 
in calculations with periodic boundary conditions. In a 
similar way, the summation over q points in the nonlocal 
exchange term can be confined, too. However, due to the 
additional dependence on k and k— q, we can only em- 
ploy those symmetry operations that leave the given 
k vector invariant, i.e., P/'k = k -I- Gf, where is a 
reciprocal lattice vector. This subset of operations {P/'} 
is commonly called little group LG(k). In the same way 
as for the IBZ the little group gives rise to a minimal 
set of inequivalent q points, which we denote by the ex- 
tended IBZ [EIBZ(k)]. Leaving out here in the paper the 
description of nonsymmorphic and time-reversal symme- 
tries for simplicity, the exchange potential in the LAPW 
basis can then be written as 



LG(k) EIBZ(k) 



»4.GG'(k) - 1. 1. N^..^JJ \r-r'\ 

^ ^ Nw,q^JJ |r-r'| r r, ( ) 



where iVk,q is the number of symmetry operations that 
are members of both LG(k) and LG(q). As a result, we 
can restrict the q summation to the EIBZ(k) and add 
the contribution of all other q points by transforming 
the final matrix with the operations LG(k) and sum- 
ming. This takes very little computation time because 
a symmetry operation acts as a one-to-one mapping in 
the space of the augmented plane waves as indicated in 
Eq. (j26p . Local orbitals transform in a similar way. This 
is why we apply the symmetrization to qq, instead of 

V^^^ , in which case we would need the irreducible repre- 
sentations again. We note that the whole formalism can 
be easily extended to the case of nonsymmorphic and 
time-reversal symmetry operations. 



In summary, we compute the nonlocal exchange poten- 
tial V^^^(k) in the space of the wave functions, where 
we restrict the q summation to the EIBZ(k) and evalu- 
ate only those band combinations n and n' , which can be 
expected to be nonzero. We then apply the transforma- 
tion in Eq. pip and sum up the different matrix elements 
according to Eq. . 



E. Singularity of the Coulomb matrix 

Due to the long-range nature of the Coulomb interac- 
tion the matrix w/j(q), Eq. (|20|) . is singular at q = 0, 
which leads to a divergent integrand in Eq. p9|) . As the 
divergence is proportional to l/q^, a three-dimensional 
integration over the BZ yields a finite value. However, 
in a practical calculation the q summation in Eq. (|19p is 
not an integral but a weighted sum over the discrete BZ 
mesh. A simple way to avoid the divergence is to exclude 
the point q = from the k-point set. Then all terms in 
Eq. p9|) are finite and the q sum can be evaluated easily. 
We find that this leads to very poor convergence with 
respect to the BZ sampling because the quantitatively 
important region around q = is not properly taken 
into account. Hence, it is advantageous to explicitly treat 
the F point and the singularity of the Coulomb matrix 
there. This is possible by a decomposition of w/j(q) into 
a divergent and a nondivergent parli^^ 

«/j(q) = f^me^''-n{e'''-''\Mj)+v'jj{<i). (27) 

The second term is finite. Its long- wavelength limit re- 
places the matrix vij{0) in Eq. (fT9)l such that the q sum- 
mation can be performed numerically. The divergent first 
term is given exactly in the limit q — )■ because the 
MPB contains the constant basis function explicitly (see 
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Sec. IIIIB[) . Therefore, we may switch to a representa- 
tion with the plane waves e*'* '". The contribution of the 
divergent term is then given by 



div 



■ikl'/'n"k-q' 



/occ. „ 

(e"'-'"<"k-ql<'k)A-d.c.^,(28) 



where d.c. denotes a double-counting correction for the 
finite q points (see below). For the important region 
close to q = we can replace (-I-) by and 5n'n"i 

respectively. We leave out higher-order corrections here 
for simplicity and defer a refined treatment employing k • 
p perturbation theory to App.[Al In order to perform the 
q integration analytically, we replace Ijq^ by the function 



^(q) = E 



G 



j-/3|q+G|" 

|q + GP 



(29) 



which was proposed by Massidda et al. in Ref. [22|. The 
parameter /? > ensures that the BZ integral can be 
extended over the whole reciprocal space. In contrast to 
Ref. [22I we choose this parameter as small as possible such 
that Eq. ([29]) is sufficiently close to Furthermore, 
by this choice terms arising from the product of with 
the first-order term of the exponential function are small 
and can thus be neglected. After inserting Eq. ([29|) in 
Eq. dSH]) we obtain 



(^riri'/nk I 27J.2 



.-/3|q|- 



-d?q 



Nun 



q=iO 



1^ 



(30) 



where the summation over q avoids double counting, 
A'k denotes the number of k points, and f^^^ is the oc- 
cupation number. In order to evaluate the integral and 
sum we introduce a reciprocal cutoff radius go ^^nd finally 
obtain 



div 



1 



E 



.-/3|q|- 



0<<?<<?o ^ 



(31) 



We get rid off the convergence parameter go by relating 
/3 and go by e^^'& = We find that fi = 0.005 is a good 
choice. 

Figure [5] shows the convergence of the exchange energy 
^NL _ 2 Y^n'w ^nn(k) with rcspcct to the k-point sam- 
pling for NaCl. While the separate contributions from 
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Figure 2: (a) Exchange energy as a function of the k-point 
mesh for NaCl. The dashed and dotted curves correspond to 
the divergent contribution Eq. (|3ip and the remaining numer- 
ical sum, respectively. The sum of both is shown by the solid 
curve, (b) Convergence of the exchange energy with (dashed 
curve) and without (solid curve) higher-order corrections at 
q = 0. Please note the different scale of the exchange energy 
in figures (a) and (b). 



the divergent term, Eq. (j3ip . and the remainder converge 
poorly, their sum nearly look constant on the energy scale 
of Fig. [^Ka). As shown in Fig. [^l^b), the k-point conver- 
gence can be improved further by taking corrections at 
q = into account that arise from multiplying 1 /g^ with 
second-order terms of (-I-) (-I-) derived by k • p perturba- 
tion theory (see App. 



IV. CALCULATIONS 

We have implemented the above algorithm in the 
FLEUR program package.— We take the MT functions 
(r) of Eq. ([9]) as the solutions of the radial Kohn-Sham 
equation with the local effective potential derived from 
the full FEE functional and (r) as their energy deriva- 
tives. If necessary local orbitals are employed to describe 



semicore states 



37.38 



In the calculations presented here, 
the core states are taken from a preceding PEE calcula- 
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Figure 3: Convergence behavior of the electron density for Si 
in a self-consistent-field cycle. The solid and dashed curves 
correspond to calculations with and without the nested den- 
sity convergence scheme (see text). 



tion and kept fixed during the self-consistent-field cycle 
with the PBEO hybrid functional. 

The one-particle Eq. pil)l must be solved self- 
consistently, as its local and nonlocal potentials depend 
on the electron density and density matrix, respectively, 
and thus on the solution of wave functions determined 
in Eq. (jlOp . Input and output densities must coincide in 
self-consistency. As a measure of convergence one usu- 
ally considers the root-mean square of the difference be- 
tween the input and output densities An, measured in 
me/bohr^, where e is the elementary charge. We con- 
sider a calculation converged, if this value falls below 
10~^ me/bohr"^. A straight iterative solution of the one- 
particle equation is bound to diverge. In DFT calcula- 
tions with a conventional local functional it is required 
to construct a new charge density for the upcoming self- 
consistency cycle from the current and a history of pre- 
vious densities, as for example, in the standard simple- 
mixing and Broyden-mixing schemes i^^i'^^ However, in 
addition to the local effective potential Eq. (fTO|l contains 
a nonlocal potential, which depends on the density ma- 
trix for which no similarly simple mixing procedure is 
available. Indeed, we find that a standard density mixing 
leads to poor convergence: 27 iterations for Si, as illus- 
trated in Fig. [21 and more than 200 iterations for SrTiOa 
are necessary. On the other hand, the fact that in the 
FLAPW method the basis for the wave functions - and 
hence for the density matrix - depends on the potential 
and changes in each iteration in contrast to that for the 
density makes the definition of a mixing scheme for the 
density matrix difficult, if not impossible. Therefore, we 
employ an alternative pragmatic approach, which leads 
to a surprisingly fast density convergence for all systems 
treated so far. The self-consistency cycle is divided into 
an outer iteration of the density matrix and an inner self- 
consistency step of the density. After the construction of 
the nonlocal exchange potential we keep its matrix rep- 



resentation l^'^GG' fixed and iterate Eq. (|10p until self- 
consistency in the density is reached; only then the ex- 
change potential V^'^qq/ is updated from the current wave 
functions, which starts a new set of inner self-consistency 
iterations. With this nested iterative procedure the outer 
loop converges after eight steps for Si, see Fig. [31 and af- 
ter only twelve steps for SrTiOs. One iteration of the 
inner loop lasts only 1.0 s for Si and 8.3 s for SrTiOa on 
a single Intel Xeon X5355 at 2.66 GHz (Cache 4 MB) 
using a 4x4x4 k-point set. This is negligible compared 
with the cost for the construction of the nonlocal poten- 
tial in the outer loop, which takes 11.9 s for Si and 573.1 s 
for SrTiOs. 

In the following we discuss the convergence of single- 
particle excitation energies and total energy differences 
with respect to the cutoff parameters Lmax and G'^-^^^ for 
the MPB as well as the number of bands 7i,nax, which 
are used to represent the nonlocal exchange potential. 
In Figs. [Ifa) and Hfb) we show the behavior of the ex- 
citation energies of the transitions T25'v ^ Tisc and 
r25''u — Xic for Si as well as Ti^y — >■ T25'c and Ri5'u — > 
R25'c for SrTiOa obtained from the self-consistent so- 
lution of Eq. (Uni) as functions of the convergence pa- 
rameters. The diagrams show that the convergence of 
these transition energies to within 0.01 eV is achieved 



for G' 



2.0bohr^' and G' 



2.7bohr"' for Si 



and SrTiOa, respectively. This is well below the ex- 
act limit G^J^ax = 2Gniax for the wave- function products 
(Gmax = 3.6bohr~^ for Si and Gmax = 4.3bohr~^ for 
SrTiOs). It is even below the reciprocal cutoff radius 
Gmax for the wave functions themselves. The same can 
be said about the cutoff parameter L,nax for the angular 
momentum. Figures [H^a) and [Hb) show that for both 
materials Lmax = 4 is sufficient while the representa- 
tion of wave functions that are properly matched at the 
MT boundaries requires a much larger cutoff value of 
^max = 8. A similar behavior was found for GW calcula- 
tions employing the MPB4i The number of bands rtmax 
that define the Hilbert space in which the exchange po- 
tential is represented can be restricted to only 50 bands 
per atom, which amounts to 100 and 250 bands for Si 
and SrTiOa, respectively. 

Figure [3I^c) shows that the total energy difference be- 
tween the diamond and wurtzite structures of Si con- 
verges even faster than the transition energies above. 
With a reciprocal cutoff radius of GJ^^^^ = 2.25 bohr~^, an 
angular-momentum cutoff of Lniax = 4 and 20 bands per 
atom we achieve an accuracy of 1 mcV, which is one order 
of magnitude smaller than the tolerance for the transition 
energies and well below the error resulting from the BZ 
discretization of the 4x4x4 k-point set. The calculations 
are converged to within 2 meV with a 8x8x8 mesh, with 
which the diamond structure is 112 meV lower in energy 
than the wurtzite structure. This is very close to the to- 
tal energy difference of 92 mcV obtained with the PBE 
functional. 

For the materials treated so far we find that GJ^^^x 
can be chosen universally smaller than Gmax; G'^^^^ = 
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Figure 4: (a) Convergence of tiie T251V Tisc (solid line, left scale) and V251V Xic (dashed line, right scale) transitions for 
Si, (b) of the direct (solid line, left scale) and indirect band gaps (dashed line, right scale) of SrTiOs, and (c) the total energy 
difference Ai? between the diamond (dia) and wurtzite (wz) phases of Si (AS = iJ'wz — i?dia) with respect to the reciprocal 
cutoff value Ginax and the angular momentum cutoff Lmax for the MPB, as well as the number of bands per atom used to 
construct the exchange potential in the space of the wave functions. For these convergence tests we employ a 4x4x4 k-point 
set. See Tables HI and HIl and the text for the fully converged values. 



0.75Ginax as a rule of thumb, while the cutoff param- 
eter Lmax is more material specific. For example, for 
EuO, whose spin-polarized valence and conduction states 
are formed by / electrons, the larger value of £niax — 6 
is necessary for proper convergence, which is still below 
^max = 8, though. Likewise, the optimal number of states 
'T'max is material specific, and thorough convergence tests 



are again necessary. 

For reference, we show transition energies obtained 
with the PBEO hybrid functional for Si, C, GaAs, MgO, 
NaCl, and crystalline Ar in Table HI All calculations are 
performed at the experimental lattice constant with a 
12x12x12 k-point mesh. The PBEO values are converged 
to within 0.01 eV with respect to the convergence param- 
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Table I: PBE and PBEO transition energies in eV for Si, C, 
GaAs, MgO, NaCI, and Ar compared with theoretical and ex- 
perimental values from the literature. All results are obtained 
with a 12x12x12 k-point set. 
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eterS GJnaxi ^maxj and Tlmax- 

and a comparison with recent PAW calculations;^ with 
which we find an overall good agreement; for Si and GaAs 
the values are nearly identical. There are slightly larger 
discrepancies for systems with wider band gaps, which 
we attribute to the different basis sets, because the cor- 
responding PBE values show similar deviations, too. In 
all cases the admixture of exact HF exchange leads to 
an increase in the transition energies in such a way that 
they come close to the measured values. For the materi- 
als shown in Table U there is still a slight underestimation 
of the band gaps for insulators and an overestimation for 
semiconductors. 

In contrast to DFT calculations with a purely local 
effective potential the nonlocality of the exchange po- 
tential does not allow a straightforward calculation of 
band structures, i.e., a diagonalization of the Hamilto- 
nian at an arbitrary point k in the BZ because according 
to Eq. (|19p the construction of V^'^''^(k) would require 
the knowledge of all occupied states at the points k— q, 
where q is an element of the k mesh defined in Sec. IIII CI 
However, these wave functions are, in general, un- 
known. Therefore, we employ the Wannier-interpolation 



We also list the PBE values 
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Figure 5: Comparison of the PBE (dotted lines) and the 
Wannier-interpolated PBEO band structure (solid lines) for 
Si. Energy e is given with respect to the Fermi energy ep. 



technique^"— as realized in the WANNIER90 code^ to 
interpolate the band energies between the ones of the k 
mesh. As an example Fig. [5] shows the PBE and the 
interpolated PBEO band structure for Si, which was con- 
structed with the help of eight sp'^-like maximally local- 
ized Wannier orbitals from the four valence and the four 
lowest conduction bands. The comparison shows that 
the main effect of the nonlocal exchange potential is an 
upwards shift of the conduction bands while the band 
dispersion remains relatively unchanged. There is, how- 
ever, a clear increase in the occupied band width. As in 
the case of PBE the conduction-band minimum lies at a 
point close to but not exactly at the X point. The plot 
of the interpolated band structure thus allows to deter- 
mine the fundamental PBEO band gap of Si easily, which 
amounts to 1.74 eV. It overestimates the experimental 
value of 1.17 eV (Ref.lH) while the PBE value of 0.47 cV 
underestimates it. 

We finally apply the PBEO functional to the more com- 
plex oxides ZnO, SrTiOa, a-Al203, and EuO. The result- 
ing band gaps are given in Table |lll 

For the II- VI semiconductor ZnO it is well known that 
LDA and GGA not only underestimate the band gap but 
also yield wrong occupied d band positions;— they are 
about 3 eV too high in energy compared with experiment. 
As a consequence the Zn d states hybridize strongly with 
the O p states. The wrong d band position relative to 
the p states is commonly attributed to the unphysical 
self-interaction error present in LDA and GGA, which is 
larger for localized than for delocalized electrons. The ad- 
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Table II: PBE and PBEO transition energies in eV for ZnO, 
SrTiOa, AI2O3, and EuO. All results are obtained with a 
8x8x8 k-point set. 
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be metallic, while experimentally it is semiconducting 
with a nearly 100% spin-polarized conduction bandj^ 
a property which makes EuO an efficient spin- filter i^i^ 
LDA+ U has been shown to give the correct electronic 
structure, if the U parameters are chosen properly j^°i^^ 
We find that the parameter-free PBEO hybrid func- 
tional also gives the physically correct semiconducting 
behavior for ferromagnetic EuO even though the calcu- 
lation was started from the metallic ground state ob- 
tained from PBE. The resulting ferromagnetic semi- 
conductor exhibits a theoretical band gap of 1.31 eV, 
which slightly overestimates the experimental value of 
0.9 eVi^ The exchange splitting of the conduction band 
amounts to 1.02 eV whereas experimental measurements 
give 0.6eV»S In summary, the parameter-free PBEO hy- 
brid functional correctly predicts the electronic ground 
state of ferromagnetic EuO in contrast to LDA and GGA. 




Figure 6: Comparison of the PBE and PBEO density of states 
(DOS) for ZnO. The stronger binding of the Zn d electrons in 
PBEO is evident. Energy e is given with respect to the Fermi 
energy ep. 



mixture of exact HF exchange into the hybrid functionals 
partly cancels this error and should therefore lower the 
relative d band position. Figure [6] shows the density of 
states for PBE (lower panel) and PBEO (upper panel) 
again obtained from a Wannier-interpolated denser k- 
point mesh. In fact, the self-interaction correction leads 
to a substantially stronger binding of the Zn d states 
from 5.1 eV in PBE to 6.3 eV in PBEO whereas the ex- 
perimental value is 7.8 eVi^ (The values correspond to 
the center of gravity of the d bands.) Concomitantly the 
d—p hybridization becomes less pronounced leading to 
an increase of the O p valence band width from 4.2 eV 
in PBE to 5.2 eV in PBEO, which deviates from the ex- 
perimental valued by only 0.1 eV. Incidentally, we also 
observe a stronger binding of the d electrons in Ge and 
GaAs. 

The missing self-interaction correction in LDA and 
GGA affects the calculation of /-electron systems even 
more strongly. Ferromagnetic EuO is predicted to 



V. SUMMARY 

We have presented an implementation of the PBEO hy- 
brid functional within the all-electron FLAPW method 
as realized in the fleur code4S The computationally 
most demanding step in the numerical procedure is the 
calculation of the nonlocal exact exchange term, which 
is a central ingredient of hybrid functionals. Our im- 
plementation relies on the matrix representation of the 
Coulomb potential in an auxiliary mixed product basis, 
which is constructed from products of the FLAPW basis 
functions. The nonlocal exchange integrals then decom- 
pose into vector-matrix-vector products. The computa- 
tional cost for these products is considerably reduced by 
a suitable unitary transformation of the mixed product 
basis, which makes the Coulomb matrix sparse. If inver- 
sion symmetry is present, the mixed product basis can 
be defined in such a way that the Coulomb matrix and 
the vectors become real- valued, which again gives rise to 
a speedup of the code. Spatial and time-reversal symme- 
tries are further exploited (1) to identify those exchange 
matrix elements in advance that are zero and need not to 
be calculated, and (2) to restrict the k-point summation 
for the nonlocal quantity to an irreducible wedge of the 
BZ. 

We have demonstrated that the PBEO interband tran- 
sition and total energies converge quickly with respect to 
the parameters of the MPB. Thus, the MPB provides a 
small but accurate all-electron basis for the construction 
of the exchange potential. We have shown that while a di- 
rect iteration of the generalized Kohn-Sham one-particle 
equation needs extensively many steps to converge, a 
nested density-only and density-matrix iteration scheme 
accelerates the convergence of the self-consistent-field cy- 
cle considerably. 

We confirm that the resulting PBEO gap energies for 
a variety of semiconductors and insulators are consis- 
tently closer to experimental measurements than their 
PBE counterparts and compare very well with recent 
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theoretical results^ obtained with the PAW method. In 
addition, we have performed PBEO calculations for the 
oxides ZnO, SrTiOa, AI2O3, and EuO. Again, the band 
gaps are clearly improved compared with PBE. Here we 
focused in particular on ZnO and EuO, which contain d 
and / electrons, respectively. Due to the missing self- 
interaction correction conventional local xc functionals 
are known to fail in describing these localized states prop- 
erly: the occupied d band position of ZnO appear too 
high in energy while EuO is even incorrectly predicted to 
be a metal. The exact exchange potential of the PBEO 
hybrid functional reduces this self-interaction error lead- 
ing to an overall improved description of the relative ener- 
getic positions of the localized and delocalized states: the 
Zn d bands are lowered in energy and thus come closer to 
their experimental position. The reduced d—p hybridiza- 
tion leads to an increase of the O p band width to 5.2 cV, 
which is in good agreement with the experiment. Fur- 
thermore, in contrast to PBE the PBEO hybrid functional 
correctly predicts a semiconducting ground state for fer- 
romagnetic EuO. The band gap and the energy splitting 
are in satisfactory agreement with experiment. 

We note that the numerical procedure presented in 
this paper is not pertinent to the PBEO hybrid func- 
tional and can easily be used for any other hybrid func- 
tional that contains the exact exchange potential, e.g., 
for the popular B3LYP functionalJ^ Furthermore, it al- 
lows a straightforward implementation of more general 
nonlocal potentials, e.g., the HSE functional,—!^ which 
is based on a screened Coulomb interaction. For this 
we just have to replace the Coulomb matrix [Eq. [50] 
by the matrix representation of the screened potential. 
We also note that in this case there is no divergence at 
the BZ center. The orbital-dependent Hartree-Fock term 
can also be used as part of an exchange-correlation func- 
tional (e.g., the exact-exchange functional) within the 
optimized-effective-potential (OEP) methodj^i^ There, 
a local instead of a nonlocal effective potential is derived, 
which requires the solution of the so-called OEP equa- 
tion. In general, the nonlocal exchange term is the first 
term in an expansion of the xc functional with respect to 
the Coulomb interaction strength and thus a central in- 
gredient in an systematic expansion of the xc functional. 
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Appendix A: Terms of the BZ integrand beyond 

Figure [2jb) shows that the k-point convergence can be 
improved considerably by taking into account terms of 
the integrand of Eq. ([28| at q = beyond the l/q^ term 



(Al) 



These terms arise from the expansion 



-2q-r cr 



<k(r) + q-Vq$^.n(r) 



(A2) 



where 



q*r,,k,q(l-) = 2^ —5 Z5 ¥'n',k(r) (A3) 



n' 



^k '^n'k 



is derived from k • p perturbation theory^S^ Inserting 
Eq. ((X2|) into Eq. and using 



(VqV^$^^,k.qK,k) + (^;^„k|VqV^$^„k,q) 

= -2(Vq$^^,k.q|Vj$r^„k,q> (A4) 

inferred from the normalization of Eq. (jA2p , we obtain 

^k,nin2(^) 

/ occ. 

E«.klVqa>^,k,q)(vJ<J'n,k,qK,k) 



(Vq<i>^„k,q|Vj<i>^,,k,q> 4 (A5) 



for /,u,k = fL.M = 1' 

(OCC. 
E«.k|Vq<i>;^,k,q> 
■a 

x(VT$-k_qK^ J)q (A6) 
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for/,H,k = /n„k = 0, and 



K,n,nM = --(Vq$,"i,k,ql</'n.,k>q 



(A7) 



E<<i.k|Vq$^.k.q) X (VT$-^_qK^_J q 
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1 and /, 



0. In the last case the second- 



order term of Eq. (jA2p is neglected. The q integration 



J 



in Eq. ([28)1 finally averages over the angular-dependent 
terms and we are left with 




\ ^ q ^ni 



for /,T„k = /° 
otherwise 



(A8) 



These spherical averages are added to the q = term 
of the numerical integral in Eq. (|28p . We note that the 
\/q term in Eqs. (jA5P and (jA7P exhibit an odd angular 
dependence and thus integrate to zero. 

Recently, Shishidou and Oguchi^ showed that the fact 
that the LAPW basis does not fulfill 



Xk+q.G(r)=Xk.G(r) 



(A9) 



in the MT spheres makes a correction to Eq. (jASj) neces- 
sary, which then becomes 



with ^^,k,q(r) - EGc£("^k)e-'<i-xS+q^G(r)- If 
Eq. (|A9p was fulfilled, the correction term would van- 
ish because then <p5ikq('") ^ fn'ki''^)- ^^'^ correction 
only affects the last term of Eq. (jA5|) and makes the in- 
tegrand exact in the limit q — >■ 0. However, we find that 
it is numerically small and negligible in the cases treated 
so far. 



V,<i>^,,(r) = ^^Y, ,(r) (AlO) 



,i'^n "k n'k 



Vq^^^.k,q(l•) - E(^"'.k|Vq^n,k,q>^"',k(r) 
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